1 Introduction and Results

Nearly fifty years ago, R. Dobrushin proved in his thesis [2] a definitive central limit
theorem (CLT) for Markov chains in discrete time that are not necessarily homogeneous
in time. Previously, Markov, Bernstein, Sapogov, and Linnik, among others, had
considered the central limit question under various sufficient conditions. Roughly, the 
progression of results relaxed the state space structure from 2 states to an arbitrary 
set of states, and also the level of asymptotic degeneracy allowed for the transition 
probabilities of the chain. 

After Dobrushin's work, some refinements and extensions of his CLT, some of them under more stringent assumptions, were proved by Statulevicius [16] and Sarymsakov [13]. See also Hanen [?] in this regard. A corresponding invariance principle was also proved by Gudynas [?]. More general references on non-homogeneous Markov processes can be found in Isaacson and Madsen [7], Iosifescu [?], Iosifescu and Theodorescu [8], and Winkler [18].
We now define what is meant by "degeneracy." Although there are many measures of "degeneracy," the one that turns out to be most useful is the contraction coefficient. This coefficient appeared in early results concerning Markov chains; however, Dobrushin popularized its use in his thesis and developed many of its important properties. [See Seneta [?] for some history.]

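Let $\pi = \pi(x,dy)$ be a Markov transition probability on $(X,\mathcal{B}(X))$. Define the contraction coefficient $\delta(\pi)$ of $\pi$ as

$$\delta(\pi) = \sup_{x_1,x_2\in X,\ A\in\mathcal{B}(X)} |\pi(x_1,A)-\pi(x_2,A)| = \frac{1}{2}\sup_{x_1,x_2\in X}\ \sup_{\|f\|_{L^\infty}\le 1}\Big|\int f(y)\,[\pi(x_1,dy)-\pi(x_2,dy)]\Big|.$$

Also, define the related coefficient $\alpha(\pi) = 1-\delta(\pi)$.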
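On a finite state space the supremum over sets $A$ is half the largest $L^1$ distance between two rows of the transition matrix, so $\delta(\pi)$ is easy to compute. A minimal sketch, assuming $\pi$ is given as a NumPy row-stochastic matrix; the function names are illustrative and not taken from the text:

```python
import numpy as np


def contraction_coefficient(P) -> float:
    """delta(P) for a row-stochastic matrix P on a finite state space.

    sup_{x1,x2,A} |P(x1,A) - P(x2,A)| is the largest total variation
    distance between two rows, i.e. half the largest L1 distance.
    """
    P = np.asarray(P, dtype=float)
    # pairwise L1 distances between rows, shape (m, m)
    dists = np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2)
    return 0.5 * float(dists.max())


def alpha_coefficient(P) -> float:
    """alpha(P) = 1 - delta(P)."""
    return 1.0 - contraction_coefficient(P)


# Rows independent of x: delta = 0.  Identity kernel (no mixing): delta = 1.
same_rows = np.array([[0.3, 0.7], [0.3, 0.7]])
identity = np.eye(3)
print(contraction_coefficient(same_rows), contraction_coefficient(identity))  # 0.0 1.0
```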

Clearly, $0\le\delta(\pi)\le 1$, and $\delta(\pi)=0$ if and only if $\pi(x,dy)$ is independent of $x$. It makes sense to call $\pi$ "non-degenerate" if $0\le\delta(\pi)<1$. We use the standard convention and denote by $\mu\pi$ and $\pi u$ the transformations induced by $\pi$ on countably additive signed measures and bounded measurable functions respectively,

$$(\mu\pi)(A) = \int \pi(x,A)\,\mu(dx) \quad\text{and}\quad (\pi u)(x) = \int u(y)\,\pi(x,dy).$$

It is easy to see that $\delta(\pi)$ has the following properties:

$$\delta(\pi) = \sup_{x_1,x_2\in X,\ u\in\mathcal{U}} |(\pi u)(x_1)-(\pi u)(x_2)|,$$

where $\mathcal{U} = \{u : \sup_{y_1,y_2}|u(y_1)-u(y_2)|\le 1\}$. That is, $\delta(\pi)$ is the operator norm of $\pi$ with respect to the Banach (semi-)norm $\mathrm{Osc}(u) = \sup_{x_1,x_2}|u(x_1)-u(x_2)|$, namely the oscillation of $u$. In particular, for any transition probabilities $\pi_1,\pi_2$ we have

$$\delta(\pi_1\pi_2)\le\delta(\pi_1)\,\delta(\pi_2). \qquad (1.1)$$

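If $\mu$ is a signed measure with $\mu(X)=0$, then

$$\|\mu\|_{\mathrm{Var}} = \sup_{A}|\mu(A)| = \frac{1}{2}\sup_{\|f\|_{L^\infty}\le 1}\Big|\int f(x)\,\mu(dx)\Big| = \sup_{u\in\mathcal{U}}\Big|\int u(x)\,\mu(dx)\Big|.$$

Therefore, by duality, for any two probability measures $\lambda$ and $\mu$ on $X$,

$$\|(\lambda-\mu)\pi\|_{\mathrm{Var}} \le \delta(\pi)\,\|\lambda-\mu\|_{\mathrm{Var}}. \qquad (1.2)$$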
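The properties above can be spot-checked numerically on random finite kernels. This is only a sanity sketch, not a proof; it assumes strictly positive random stochastic matrices and uses illustrative helper names (`random_kernel`, `var_norm`, `osc`):

```python
import numpy as np


def contraction_coefficient(P) -> float:
    return 0.5 * float(np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2).max())


def random_kernel(m: int, rng: np.random.Generator) -> np.ndarray:
    A = rng.random((m, m))
    return A / A.sum(axis=1, keepdims=True)


def osc(u: np.ndarray) -> float:
    return float(u.max() - u.min())


def var_norm(mu: np.ndarray, nu: np.ndarray) -> float:
    # ||mu - nu||_Var = sup_A |mu(A) - nu(A)| = half the L1 distance
    return 0.5 * float(np.abs(mu - nu).sum())


rng = np.random.default_rng(0)
m, checked = 5, 0
for _ in range(1000):
    P1, P2 = random_kernel(m, rng), random_kernel(m, rng)
    lam, mu = rng.dirichlet(np.ones(m)), rng.dirichlet(np.ones(m))
    u = rng.normal(size=m)
    d1, d2 = contraction_coefficient(P1), contraction_coefficient(P2)
    assert contraction_coefficient(P1 @ P2) <= d1 * d2 + 1e-12             # (1.1)
    assert var_norm(lam @ P1, mu @ P1) <= d1 * var_norm(lam, mu) + 1e-12  # (1.2)
    assert osc(P1 @ u) <= d1 * osc(u) + 1e-12    # delta as operator norm for Osc
    checked += 1
print(f"{checked} random cases satisfy (1.1), (1.2) and the Osc bound")
```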

By a non-homogeneous Markov chain of length $n$ on state space $(X,\mathcal{B}(X))$ corresponding to transition operators $\{\pi_{i,i+1} = \pi_{i,i+1}(x,dy) : 1\le i\le n-1\}$ we mean the Markov process $P$ on the product space $(X^n,\mathcal{B}(X^n))$,

$$P[X_{i+1}\in A\,|\,X_i = x] = \pi_{i,i+1}(x,A),$$

where $\{X_i : 1\le i\le n\}$ are the canonical projections. In particular, under the initial distribution $X_1\sim\mu$, the distribution at time $k>1$ is $\mu\pi_{1,2}\pi_{2,3}\cdots\pi_{k-1,k}$. For $i<j$ we will define

$$\pi_{i,j} = \pi_{i,i+1}\,\pi_{i+1,i+2}\cdots\pi_{j-1,j}.$$

We denote by E[Z] and V(Z) the expectation and variance of the random variable Z 
with respect to P. 

Consider now a non-homogeneous Markov chain on $X$ with respect to transition operators $\{\pi_{i,i+1} : 1\le i\le n-1\}$. The following comparison of marginal distributions at time $n$ starting from different initial conditions is an easy consequence of (1.1) and (1.2):

$$\|\lambda\pi_{1,n}-\mu\pi_{1,n}\|_{\mathrm{Var}} \le \|\lambda-\mu\|_{\mathrm{Var}}\,\delta(\pi_{1,n}) \le \|\lambda-\mu\|_{\mathrm{Var}}\prod_{i=1}^{n-1}\delta(\pi_{i,i+1}). \qquad (1.3)$$
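As an illustration of (1.3), the sketch below pushes two initial laws through a hypothetical sequence of random kernels and compares the three members of the inequality. The kernels and sizes are arbitrary assumptions chosen for the example:

```python
import numpy as np


def contraction_coefficient(P) -> float:
    return 0.5 * float(np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2).max())


def var_norm(mu, nu) -> float:
    return 0.5 * float(np.abs(mu - nu).sum())


rng = np.random.default_rng(1)
m, n = 4, 8
# kernels[i - 1] plays the role of pi_{i,i+1}, i = 1..n-1 (random, hypothetical)
kernels = [A / A.sum(axis=1, keepdims=True) for A in rng.random((n - 1, m, m))]
lam, mu = rng.dirichlet(np.ones(m)), rng.dirichlet(np.ones(m))

pi_1n = np.eye(m)
for K in kernels:
    pi_1n = pi_1n @ K      # pi_{1,n} = pi_{1,2} pi_{2,3} ... pi_{n-1,n}

lhs = var_norm(lam @ pi_1n, mu @ pi_1n)
middle = var_norm(lam, mu) * contraction_coefficient(pi_1n)
rhs = var_norm(lam, mu) * float(np.prod([contraction_coefficient(K) for K in kernels]))
print(lhs, middle, rhs)    # the three members of (1.3), in order
```

Since every $\delta(\pi_{i,i+1})<1$ here, the right-hand side decays geometrically in $n$.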



Dobrushin's theorem concerns the fluctuations of an array of non-homogeneous Markov chains. For each $n\ge 1$, let $\{X_i^{(n)} : 1\le i\le n\}$ be $n$ observations of a non-homogeneous Markov chain on $X$ with transition matrices $\{\pi_{i,i+1} = \pi^{(n)}_{i,i+1}(x,dy) : 1\le i\le n-1\}$. Let also

$$\alpha_n = \min_{1\le i\le n-1}\alpha\big(\pi^{(n)}_{i,i+1}\big).$$

In addition, let $\{f_i^{(n)} : 1\le i\le n\}$ be real-valued functions on $X$. Define, for $n\ge 1$, the sum

$$S_n = \sum_{i=1}^n f_i^{(n)}(X_i^{(n)}).$$



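Theorem 1.1 Suppose that for some finite constants $C_n$,

$$\sup_{1\le i\le n}\ \sup_{x}|f_i^{(n)}(x)| \le C_n.$$

Then, if

$$\lim_{n\to\infty} C_n^2\,\alpha_n^{-3}\Big[\sum_{i=1}^n V\big(f_i^{(n)}(X_i^{(n)})\big)\Big]^{-1} = 0, \qquad (1.4)$$

we have, regardless of the initial distribution, that

$$\frac{S_n - E[S_n]}{\sqrt{V(S_n)}} \Rightarrow N(0,1). \qquad (1.5)$$

In general, the result is not true if condition (1.4) is not met.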



In [2], Dobrushin also states the direct corollary which simplifies some of the as- 
sumptions. 

Corollary 1.1 When the functions are uniformly bounded, i.e. $\sup_n C_n = C<\infty$, and the variances are bounded below, i.e. $V(f_i^{(n)}(X_i^{(n)}))\ge c>0$ for all $1\le i\le n$ and $n\ge 1$, then we have the convergence (1.5) provided

$$\lim_{n\to\infty} n^{1/3}\alpha_n = \infty. \qquad (1.6)$$
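To see how the corollary follows from Theorem 1.1, note that under its hypotheses $C_n\le C$ and $\sum_{i=1}^n V(f_i^{(n)}(X_i^{(n)}))\ge nc$, so the quantity in (1.4) is controlled by $n^{1/3}\alpha_n$ alone:

```latex
\[
  \frac{C_n^2\,\alpha_n^{-3}}{\sum_{i=1}^n V\big(f_i^{(n)}(X_i^{(n)})\big)}
  \;\le\; \frac{C^2\,\alpha_n^{-3}}{n\,c}
  \;=\; \frac{C^2}{c}\,\big(n^{1/3}\alpha_n\big)^{-3}
  \;\longrightarrow\; 0
  \quad\text{whenever } n^{1/3}\alpha_n \to \infty .
\]
```

For instance, $\alpha_n = n^{-1/4}$ satisfies (1.6), while $\alpha_n = n^{-1/3}$ is exactly the borderline case.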

We remark that in [2] (e.g. Theorems 3, 8) there are also results where the boundedness condition on $f_i^{(n)}$ is replaced by integrability conditions. As these results follow from truncation methods and Theorem 1.1 for bounded variables, we only consider Dobrushin's theorem in the bounded case.

Also, for the ease of the reader, and to be complete, we will discuss in the next section an example, given in [2] and due to Dobrushin and Bernstein, of how the weak convergence (1.5) may fail when condition (1.4) is not satisfied.

We now consider Dobrushin's methods. The techniques used in [2] to prove the above results fall under the general heading of the "blocking method." Condition (1.4) ensures that well-separated blocks of observations may be approximated by independent versions with small error. Indeed, in many remarkable steps, Dobrushin exploits the Markov property and several contraction coefficient properties, which he himself derives, to deduce error bounds sufficient to apply CLTs for independent variables. However, in [2], it is difficult to see, even at the technical level, why condition (1.4) is natural.

The aim of this note is to provide a different, shorter proof of Theorem 1.1 which explains more clearly why condition (1.4) appears in the result. The methods are through martingale approximations and martingale CLTs, which perhaps were not as codified in the early 1950s as they are today. These methods go back at least to Gordin [3] in the context of homogeneous processes, and have been used by others in other "related" situations (e.g. Kifer [?]; see also Pinsky [?]). There are three main ingredients in this approximation with respect to the non-homogeneous setting of Theorem 1.1: (1) negligibility estimates for individual components, (2) a law of large numbers for conditional variances, and (3) lower bounds for the variance $V(S_n)$. Negligibility bounds and a LLN are well-known requirements for martingale CLTs (cf. Hall and Heyde [?], ch. 3), and in fact, as will be seen, the sufficiency of condition (1.4) is transparent in the proofs of these two components (Lemma 4.2 and Lemmas 4.3 and 4.4). The variance lower bounds which we will use were also derived by Dobrushin in his proof. However, using some martingale properties, we give a more direct argument for a better estimate.

We note also that, with this martingale approximation, an invariance principle for the partial sums holds through standard martingale results (see Hall and Heyde [?], among other references). In fact, from the martingale invariance principle, it should be possible to derive Gudynas's theorems [?], although this is not done here.
We now explain the structure of the article. In section 2, we give the Bernstein-Dobrushin example of a Markov chain with anomalous behavior. In section 3, we state the martingale CLT that will be utilized and, as a preview of the non-homogeneous chain proof, we quickly reprise the argument with respect to homogeneous chains. In section 4, we prove Theorem 1.1 with martingale approximation, assuming a lower bound on the variance $V(S_n)$. Last, in section 5, we prove this variance estimate.

2 Anomalous Example 

Here, we summarize the example in Dobrushin's thesis, attributed to Bernstein, which shows that condition (1.4) is sharp.

Example 2.1 Let $X=\{1,2\}$, and consider the $2\times 2$ transition matrices on $X$,

$$Q(p) = \begin{pmatrix} 1-p & p\\ p & 1-p\end{pmatrix}, \qquad 0\le p\le 1.$$

The contraction coefficient $\delta(Q(p))$ of $Q(p)$ is $|1-2p|$. Note that $\delta(Q(p)) = \delta(Q(1-p))$. The invariant measures for all the $Q(p)$ are the same: $\mu(1)=\mu(2)=\frac{1}{2}$. We will be looking at $Q(p)$ for $p$ close to 0 or 1 and at the special case $p=\frac{1}{2}$. However, when $p$ is small, the homogeneous chains behave very differently under $Q(p)$ and $Q(1-p)$. More specifically, when $p$ is small there are very few switches between the two states, whereas when $1-p$ is small the chain switches most of the time. In fact, this behavior can be made more precise (see Dobrushin [1], Hanen [?], or direct computation). Let $T_n=\sum_{i=1}^n 1_{\{1\}}(X_i)$ count the number of visits to state 1, say, in $n$ steps.

Case A. Consider the homogeneous chain under $Q(p)$ with $p=\frac{1}{n}$ and initial distribution $\mu(1)=\mu(2)=\frac{1}{2}$. Then,

$$\frac{T_n}{n}\Rightarrow G \quad\text{and}\quad \lim_{n\to\infty} n^{-2}V(T_n) = V_A<\infty, \qquad (2.1)$$

where $G$ is a proper distribution supported on $[0,1]$.

Case B. Consider the homogeneous chain run under $Q(p)$ with $p=1-\frac{1}{n}$ and initial distribution $\mu(1)=\mu(2)=\frac{1}{2}$. Then,

$$T_n-\frac{n+1}{2}\Rightarrow F \quad\text{and}\quad \lim_{n\to\infty} V(T_n) = V_B<\infty, \qquad (2.2)$$

where $F$ is a proper distribution function.

Let a sequence $\alpha_n\to 0$ with $\alpha_n>0$ be given. To construct the anomalous Markov chain, it will be helpful to split the time horizon $[1,2,\dots,n]$ into roughly $n\alpha_n$ blocks of size $\alpha_n^{-1}$. We interpose a $Q(\frac{1}{2})$ between any two blocks, which has the effect of making the blocks independent of each other. More precisely, let $k_i^{(n)} = i[\alpha_n^{-1}]$ for $1\le i\le m_n$, where $m_n = [n/[\alpha_n^{-1}]]$. Also, define $k_0^{(n)}=0$ and $k_{m_n+1}^{(n)}=n$. Define now, for $1\le i\le n-1$,

$$\pi^{(n)}_{i,i+1} = \begin{cases} Q(\alpha_n) & \text{for } i=1,2,\dots,k_1^{(n)}-1,\\ Q(\frac{1}{2}) & \text{for } i=k_1^{(n)},k_2^{(n)},\dots,k_{m_n}^{(n)},\\ Q(1-\alpha_n) & \text{for all other } i.\end{cases}$$

Consider the non-homogeneous chain with respect to $\{\pi^{(n)}_{i,i+1} : 1\le i\le n-1\}$ starting from equilibrium $\mu(1)=\mu(2)=\frac{1}{2}$. From the definition of the chain, one observes, as $Q(\frac{1}{2})$ does not distinguish between states, that the processes in the time horizons $\{(k_i^{(n)}+1,\,k_{i+1}^{(n)}) : 0\le i\le m_n\}$ are mutually independent. For the first time segment, 1 to $k_1^{(n)}$, the chain is in regime A, while for the other segments, the chain is in case B.
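The construction can be sketched as a schedule of flip probabilities $p_i$ with $\pi^{(n)}_{i,i+1}=Q(p_i)$, plus a simulator for one path. This is an illustrative sketch: the helper names are hypothetical, and the block length $L$ stands for $[\alpha_n^{-1}]$ (so $\alpha_n=1/L$), which avoids floating-point rounding in the integer part:

```python
import numpy as np


def bernstein_schedule(n: int, L: int) -> np.ndarray:
    """Flip probabilities p_i, i = 1..n-1, of pi_{i,i+1} = Q(p_i); p[i-1] holds p_i.

    L plays the role of [alpha_n^{-1}], so alpha_n = 1/L and k_j = j * L.
    """
    alpha = 1.0 / L
    m_n = n // L
    k = {j * L for j in range(1, m_n + 1)}
    p = np.empty(n - 1)
    for i in range(1, n):
        if i < L:
            p[i - 1] = alpha        # Q(alpha_n): regime A, rare switches
        elif i in k:
            p[i - 1] = 0.5          # Q(1/2): equal rows, next block independent
        else:
            p[i - 1] = 1.0 - alpha  # Q(1 - alpha_n): regime B, frequent switches
    return p


def simulate_path(p: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """One path X_1..X_n on {1, 2} started from the invariant law (1/2, 1/2)."""
    n = len(p) + 1
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(1, 3)            # uniform on {1, 2}
    flips = rng.random(n - 1) < p        # switch state with probability p_i
    for i in range(1, n):
        x[i] = 3 - x[i - 1] if flips[i - 1] else x[i - 1]
    return x


n, L = 1000, 10                          # alpha_n = 1/L = n^{-1/3}
p = bernstein_schedule(n, L)
path = simulate_path(p, np.random.default_rng(0))
T = int((path == 1).sum())               # T^{(n)}: visits to state 1
```

Note that when $k_{m_n}^{(n)}=n$ the corresponding $Q(\frac12)$ falls outside the horizon, which is why only 99 decoupling steps appear for $n=1000$, $L=10$.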

Once again, let us concentrate on the number of visits to state 1. Denote by $T^{(n)}=\sum_{i=1}^n 1_{\{1\}}(X_i^{(n)})$ and $T^{(n)}(k,l)=\sum_{i=k}^{l} 1_{\{1\}}(X_i^{(n)})$ the counts in the first $n$ steps and in steps $k$ to $l$ respectively. It follows from the discussion of independence above that

$$T^{(n)} = \sum_{i=0}^{m_n} T^{(n)}\big(k_i^{(n)}+1,\,k_{i+1}^{(n)}\big)$$

is the sum of independent sub-counts where, additionally, the sub-counts for $1\le i\le m_n-1$ are identically distributed, the last sub-count perhaps being shorter. Also, as the initial distribution is invariant, we have $V(1_{\{1\}}(X_i^{(n)}))=1/4$ for all $i$ and $n$. Then, in the notation of Corollary 1.1, $C=1$ and $c=1/4$.
From (2.1), we have that

$$V\big(T^{(n)}(1,k_1^{(n)})\big)\sim\alpha_n^{-2}V_A \quad\text{as } n\uparrow\infty.$$

Also, from (2.2) and independence of the $m_n$ sub-counts, we have that

$$V\big(T^{(n)}(k_1^{(n)}+1,n)\big)\sim n\alpha_n V_B \quad\text{as } n\uparrow\infty.$$

From these calculations, we see that if $n^{1/3}\alpha_n\to\infty$, then $\alpha_n^{-2}\ll n\alpha_n$, and so the major contribution to $T^{(n)}$ is from $T^{(n)}(k_1^{(n)}+1,n)$. However, since this last count is (virtually) the sum of $m_n$ i.i.d. sub-counts, we have that $T^{(n)}$, properly normalized, converges to $N(0,1)$, as predicted by Dobrushin's Theorem 1.1.
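The two variance asymptotics can be examined exactly, without simulation. For the symmetric two-state chain started from $(\frac12,\frac12)$, a direct computation gives $\mathrm{Cov}(1_{\{1\}}(X_i),1_{\{1\}}(X_j))=\frac14\prod_{l=i}^{j-1}(1-2p_l)$ for $i<j$, which yields an $O(n)$ recursion for $V(T)$. A sketch under these assumptions (helper names are hypothetical; the schedule matches the construction, with $\alpha_n=1/L$):

```python
import numpy as np


def var_visits(p) -> float:
    """Exact V(T), T = visits to state 1, for the symmetric two-state chain
    with flip probabilities p_1..p_{n-1}, started from (1/2, 1/2).

    V(T) = n/4 + (1/2) sum_j A_j with A_j = sum_{i<j} prod_{l=i}^{j-1} r_l,
    r_l = 1 - 2 p_l, using the recursion A_{j+1} = r_j (A_j + 1).
    """
    r = 1.0 - 2.0 * np.asarray(p, dtype=float)
    n = len(r) + 1
    A, total = 0.0, 0.0
    for rj in r:                 # produces A_2, ..., A_n in turn
        A = rj * (A + 1.0)
        total += A
    return n / 4.0 + total / 2.0


def bernstein_schedule(n: int, L: int) -> np.ndarray:
    """alpha_n = 1/L; Q(alpha_n) before k_1 = L, Q(1/2) at multiples of L."""
    i = np.arange(1, n)
    p = np.where(i < L, 1.0 / L, 1.0 - 1.0 / L)
    p[i % L == 0] = 0.5
    return p


for L in (10, 20, 40):
    n = L**3                           # alpha_n = n^{-1/3}: the critical scaling
    p = bernstein_schedule(n, L)
    v_first = var_visits(p[: L - 1])   # steps 1..k_1 (regime A)
    v_rest = var_visits(p[L:])         # steps k_1 + 1..n (regime B blocks)
    v_total = var_visits(p)
    # alpha_n^{-2} = L^2 and n * alpha_n = L^2: both parts stay comparable
    print(L, v_first / L**2, v_rest / L**2, v_total - v_first - v_rest)
```

The last printed column is zero up to rounding, reflecting the independence created by $Q(\frac12)$.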

On the other hand, if $\alpha_n=n^{-1/3}$, we have $\alpha_n^{-2}=n\alpha_n$, and the count $T^{(n)}(1,k_1^{(n)})$, independent of $T^{(n)}(k_1^{(n)}+1,n)$, also contributes to the sum $T^{(n)}$. After appropriate scaling, then, $T^{(n)}$ approaches the convolution of a non-trivial non-normal distribution and a normal distribution, and therefore is certainly not Gaussian.



3 Martingale CLT



The central limit theorem for martingale differences is by now a standard tool. We quote the following (strong) form of the result implied by Corollary 3.1 in Hall and Heyde [?].



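Proposition 3.1 For each $n\ge 1$, let $\{(W_i^{(n)},\mathcal{F}_i^{(n)}) : 0\le i\le n\}$ be a martingale relative to the nested family $\mathcal{F}_0^{(n)}\subset\mathcal{F}_1^{(n)}\subset\cdots\subset\mathcal{F}_n^{(n)}$, with $W_0^{(n)}=0$. Let $\xi_i^{(n)}=W_i^{(n)}-W_{i-1}^{(n)}$ be their differences. Suppose that

$$\max_{1\le i\le n}\|\xi_i^{(n)}\|_{L^\infty}\to 0 \quad\text{and}\quad \sum_{i=1}^n E\big[(\xi_i^{(n)})^2\,\big|\,\mathcal{F}_{i-1}^{(n)}\big]\to 1 \ \text{in probability}.$$

Then,

$$W_n^{(n)}\Rightarrow N(0,1).$$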

Note that the first and second limit conditions are, respectively, the negligibility assumption on the sequence and the law of large numbers for conditional variances mentioned in the introduction.

We now sketch a proof of Corollary 1.1 in the case of a homogeneous Markov chain on a finite state space. Assume that we have a Markov chain with transition probability $P$ on a finite state space $X$. If $\delta(P)<1$, and $f : X\to\mathbb{R}$ is a function with mean 0 with respect to the invariant distribution $\pi$ on $X$, then $f$ is in the range of $I-P$, and the equation $(I-P)u=f$ has a solution. The following argument is implicit in Gordin [3], and also explicitly used in Kipnis and Varadhan [11].
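On a finite state space the equation $(I-P)u=f$ can be solved directly. A minimal sketch, assuming $P$ has strictly positive entries (so $\delta(P)<1$); it uses the fundamental-matrix device $(I-P+\mathbf 1\pi)^{-1}$, which is one standard way to select the solution with $E_\pi[u]=0$ and is not claimed to be the construction used in the cited works:

```python
import numpy as np


def invariant_distribution(P: np.ndarray) -> np.ndarray:
    """Left Perron eigenvector of an irreducible stochastic matrix, normalized."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()


def solve_poisson(P: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Solve (I - P) u = f with E_pi[u] = 0, assuming E_pi[f] = 0."""
    m = P.shape[0]
    pi = invariant_distribution(P)
    # pi(I - P + 1 pi) u = pi(u) must equal pi(f) = 0, so (I - P) u = f
    return np.linalg.solve(np.eye(m) - P + np.outer(np.ones(m), pi), f)


rng = np.random.default_rng(0)
A = rng.random((4, 4))
P = A / A.sum(axis=1, keepdims=True)    # strictly positive, so delta(P) < 1
pi = invariant_distribution(P)
g = rng.normal(size=4)
f = g - pi @ g                          # centre f under the invariant law
u = solve_poisson(P, f)
residual = float(np.abs((np.eye(4) - P) @ u - f).max())
```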

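Using the relation $E[u(X_{j+1})\,|\,\mathcal{F}_j]=(Pu)(X_j)$, it is easy to check that

$$f(X_j) = u(X_j)-E[u(X_{j+1})\,|\,\mathcal{F}_j] = u(X_j)-u(X_{j+1})+\xi_{j+1},$$

where

$$\xi_j = u(X_j)-E[u(X_j)\,|\,\mathcal{F}_{j-1}]$$

is a martingale difference. Then,

$$\sum_{j=0}^{n-1} f(X_j) = u(X_0)-u(X_n)+\sum_{j=1}^n\xi_j.$$

If we define $M_n=\sum_{j=1}^n\xi_j$ and $q(x)=(Pu^2)(x)-((Pu)(x))^2$, then, by the Markov property,

$$E[\xi_{j+1}^2\,|\,\mathcal{F}_j] = q(X_j).$$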
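The decomposition is a pathwise identity, so it can be checked on any simulated trajectory. A sketch on a hypothetical random 4-state chain (states labelled 0 to 3), reusing the fundamental-matrix solution of $(I-P)u=f$ as an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 500
A = rng.random((m, m))
P = A / A.sum(axis=1, keepdims=True)

# invariant law, a centred f, and u solving (I - P) u = f
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()
g = rng.normal(size=m)
f = g - pi @ g
u = np.linalg.solve(np.eye(m) - P + np.outer(np.ones(m), pi), f)
Pu = P @ u

# simulate X_0, ..., X_n
X = np.empty(n + 1, dtype=int)
X[0] = rng.choice(m, p=pi)
for j in range(n):
    X[j + 1] = rng.choice(m, p=P[X[j]])

xi = u[X[1:]] - Pu[X[:-1]]        # xi_j = u(X_j) - (Pu)(X_{j-1}), j = 1..n
lhs = f[X[:-1]].sum()             # sum_{j=0}^{n-1} f(X_j)
rhs = u[X[0]] - u[X[-1]] + xi.sum()
q = P @ u**2 - Pu**2              # q(x) = E[xi_{j+1}^2 | X_j = x]
```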

We will apply the martingale CLT (Proposition 3.1) to the array formed from $W_j^{(n)}=M_j/\sqrt{n}$ with differences $\xi_j^{(n)}=\xi_j/\sqrt{n}$. As the differences are uniformly bounded, $\|\xi_j^{(n)}\|_{L^\infty}\le 2\|u\|_{L^\infty}/\sqrt{n}$, the first condition of Proposition 3.1 is satisfied. The second follows from the following computation. From the Markov property,

$$\sum_{j=1}^n E\big[(\xi_j^{(n)})^2\,\big|\,\mathcal{F}_{j-1}\big] = \frac{1}{n}\sum_{j=1}^n q(X_{j-1}) = \frac{1}{n}\sum_{j=0}^{n-1} q(X_j).$$

So, by the ergodic theorem, the last expression converges almost surely to $V=E_\pi[q(X_0)]<\infty$. It is not difficult to see that $V>0$.
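The limit $V=E_\pi[q(X_0)]$ can be computed exactly for a finite chain. Since $\pi P=\pi$ and $Pu=u-f$, one also has $V=2\langle f,u\rangle_\pi-\langle f,f\rangle_\pi$, which gives an independent check. A sketch on a hypothetical random 5-state chain:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
A = rng.random((m, m))
P = A / A.sum(axis=1, keepdims=True)
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()
g = rng.normal(size=m)
f = g - pi @ g
u = np.linalg.solve(np.eye(m) - P + np.outer(np.ones(m), pi), f)

q = P @ u**2 - (P @ u) ** 2             # conditional variance of the increments
V = pi @ q                              # V = E_pi[q(X_0)]
V_alt = 2 * pi @ (f * u) - pi @ f**2    # same number, via Pu = u - f
print(V, V_alt)
```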

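Therefore, $V(M_n)\sim nV$ and $(nV)^{-1/2}M_n\Rightarrow N(0,1)$ by Proposition 3.1. Since the difference satisfies

$$n^{-1/2}\Big|M_n-\sum_{j=0}^{n-1}f(X_j)\Big| \le n^{-1/2}\|u(X_0)-u(X_n)\|_{L^\infty} \le 2n^{-1/2}\|u\|_{L^\infty}\to 0,$$

we have $V(S_n)\sim nV$ and $S_n/\sqrt{V(S_n)}\Rightarrow N(0,1)$ also. □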

4 Proof of Theorem 1.1

We give here a short proof of Theorem 1.1 through martingale approximation, illustrated for homogeneous chains in the previous section. Consider the non-homogeneous setting of Theorem 1.1. To follow the homogeneous argument, we will need to find the non-homogeneous analogue of the resolvent function "$u=(I-P)^{-1}f$." To simplify notation, we will assume throughout that the functions $\{f_i^{(n)}\}$ are mean-zero, $E[f_i^{(n)}(X_i^{(n)})]=0$ for $1\le i\le n$ and $n\ge 1$. Define, for $1\le k\le n$,

$$Z_k^{(n)} = \sum_{i=k}^n E\big[f_i^{(n)}(X_i^{(n)})\,\big|\,X_k^{(n)}\big],$$

that is,

$$Z_k^{(n)} = \begin{cases} f_k^{(n)}(X_k^{(n)}) + \sum_{i=k+1}^n E\big[f_i^{(n)}(X_i^{(n)})\,\big|\,X_k^{(n)}\big] & \text{for } 1\le k\le n-1,\\ f_n^{(n)}(X_n^{(n)}) & \text{for } k=n.\end{cases} \qquad (4.1)$$
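On a finite state space each $Z_k^{(n)}$ is a function of $X_k^{(n)}$, and since $E[Z_{k+1}^{(n)}\,|\,X_k^{(n)}]=(\pi_{k,k+1}Z_{k+1}^{(n)})(X_k^{(n)})$, (4.1) becomes the backward recursion $Z_n=f_n$, $Z_k=f_k+\pi_{k,k+1}Z_{k+1}$. The sketch below uses it on a hypothetical random chain, compares $Z_1$ with the direct definition, and prints the two sides of the bound $\|Z_k^{(n)}\|_{L^\infty}\le 2C_n\alpha_n^{-1}$ of Lemma 4.1 below:

```python
import numpy as np


def contraction_coefficient(P) -> float:
    return 0.5 * float(np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2).max())


def resolvent_sequence(F, K):
    """Z_k as functions on a finite state space (list index k-1 holds Z_k)."""
    Z = [None] * len(F)
    Z[-1] = F[-1].copy()
    for k in range(len(F) - 2, -1, -1):
        Z[k] = F[k] + K[k] @ Z[k + 1]   # Z_k = f_k + pi_{k,k+1} Z_{k+1}
    return Z


# Hypothetical test chain: random kernels K[k-1] = pi_{k,k+1} on m states.
rng = np.random.default_rng(3)
m, n = 4, 30
mu = rng.dirichlet(np.ones(m))                  # law of X_1
K = [A / A.sum(axis=1, keepdims=True) for A in rng.random((n - 1, m, m)) + 0.2]
marg = [mu]
for P in K:
    marg.append(marg[-1] @ P)                   # law of X_{k+1}

# f_k centred under the law of X_k, as assumed in the text
F = [g - marg[k] @ g for k, g in enumerate(rng.uniform(-1, 1, size=(n, m)))]
Z = resolvent_sequence(F, K)

# Direct definition for k = 1: Z_1 = sum_i pi_{1,i} f_i
Pi, direct = np.eye(m), F[0].copy()
for i in range(1, n):
    Pi = Pi @ K[i - 1]
    direct = direct + Pi @ F[i]

C_n = max(float(np.abs(f).max()) for f in F)
alpha_n = min(1.0 - contraction_coefficient(P) for P in K)
print(max(float(np.abs(z).max()) for z in Z), 2 * C_n / alpha_n)
```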

Remark 4.1 Before going further, we remark that the sequence $\{Z_k^{(n)}\}$ can indeed be thought of as a generalization of the resolvent sequence $\{u(X_k)\}$ used in the case of a homogeneous chain. When the array $\{X_i^{(n)}\}$ is formed from the sequence $\{X_i\}$, $f_i^{(n)}=f$ for all $i$ and $n$, and the chain is homogeneous, $\pi^{(n)}_{i,i+1}=P$ for all $i$ and $n$, then $Z_k^{(n)}$ reduces to $Z_k^{(n)} = f(X_k)+\sum_{i=1}^{n-k}(P^i f)(X_k)$, which approximates $\sum_{i=0}^{\infty}(P^i f)(X_k) = [(I-P)^{-1}f](X_k) = u(X_k)$. See also pp. 145-146 of Varadhan [17] for other uses of $\{Z_k^{(n)}\}$.



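Now, let us return to the full non-homogeneous setting of Theorem 1.1. By rearranging terms in (4.1), we obtain, for $1\le k\le n-1$,

$$\begin{aligned} f_k^{(n)}(X_k^{(n)}) &= Z_k^{(n)} - E\big[Z_{k+1}^{(n)}\,\big|\,X_k^{(n)}\big] \qquad (4.2)\\ &= \Big[Z_{k+1}^{(n)} - E\big[Z_{k+1}^{(n)}\,\big|\,X_k^{(n)}\big]\Big] + \Big[Z_k^{(n)} - Z_{k+1}^{(n)}\Big]. \end{aligned}$$

Then we have the decomposition

$$S_n = \sum_{k=1}^n f_k^{(n)}(X_k^{(n)}) = \sum_{k=2}^n\Big[Z_k^{(n)} - E\big[Z_k^{(n)}\,\big|\,X_{k-1}^{(n)}\big]\Big] + Z_1^{(n)}, \qquad (4.3)$$

and so, in particular, $V(S_n) = \sum_{k=2}^n V\big(Z_k^{(n)} - E[Z_k^{(n)}\,|\,X_{k-1}^{(n)}]\big) + V(Z_1^{(n)})$. Let us now define the differences

$$\xi_k^{(n)} = \frac{1}{\sqrt{V(S_n)}}\Big[Z_k^{(n)} - E\big[Z_k^{(n)}\,\big|\,X_{k-1}^{(n)}\big]\Big] \qquad (4.4)$$

and the martingale $M_k^{(n)}=\sum_{l=2}^k\xi_l^{(n)}$ with respect to $\mathcal{F}_k^{(n)}=\sigma\{X_l^{(n)} : 1\le l\le k\}$ for $n\ge 1$. The plan to obtain Theorem 1.1 will now be to approximate $S_n/\sqrt{V(S_n)}$ by $M_n^{(n)}$ and use Proposition 3.1. Condition (1.4) will be a natural sufficient condition for "negligibility" (Lemma 4.2) and "LLN" (Lemmas 4.3 and 4.4) with regard to Proposition 3.1.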

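Lemma 4.1 For $1\le i\le j\le n$, we have the bound

$$\big\|E\big[f_j^{(n)}(X_j^{(n)})\,\big|\,X_i^{(n)}\big]\big\|_{L^\infty} \le 2C_n(1-\alpha_n)^{j-i}.$$

Hence, for $1\le k\le n$,

$$\|Z_k^{(n)}\|_{L^\infty} \le 2C_n\alpha_n^{-1}.$$

Proof. Since $\|f_j^{(n)}\|_{L^\infty}\le C_n$, its oscillation satisfies $\mathrm{Osc}(f_j^{(n)})\le 2C_n$. From (1.1),

$$\mathrm{Osc}(\pi_{i,j}f_j^{(n)}) \le 2C_n\,\delta(\pi_{i,j}) \le 2C_n(1-\alpha_n)^{j-i}.$$

Because $E[(\pi_{i,j}f_j^{(n)})(X_i^{(n)})] = E[f_j^{(n)}(X_j^{(n)})] = 0$,

$$\|\pi_{i,j}f_j^{(n)}\|_{L^\infty} \le \mathrm{Osc}(\pi_{i,j}f_j^{(n)}) \le 2C_n(1-\alpha_n)^{j-i}.$$

The second estimate now follows from this estimate. Indeed,

$$\|Z_k^{(n)}\|_{L^\infty} = \Big\|\sum_{i=k}^n E\big[f_i^{(n)}(X_i^{(n)})\,\big|\,X_k^{(n)}\big]\Big\|_{L^\infty} \le 2C_n\sum_{i=k}^n(1-\alpha_n)^{i-k} \le 2C_n\alpha_n^{-1}. \qquad □$$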

We now state a lower bound for the variance, which will be proved in the next section using martingale ideas. We remark that in [2] the bound $V(S_n)\ge(\alpha_n/8)\sum_{i=1}^n V(f_i^{(n)}(X_i^{(n)}))$ is actually found by different methods (see also section 1.2.2 of [?]).
Proposition 4.1 For $n\ge 1$,

$$V(S_n) \ge \frac{\alpha_n}{4}\sum_{i=1}^n V\big(f_i^{(n)}(X_i^{(n)})\big). \qquad (4.5)$$
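Both the variance decomposition (4.3) and the bound (4.5) can be checked exactly on a small chain, since every expectation reduces to a finite sum over the marginals. A sketch under the assumption of a hypothetical random 3-state chain with centred $f$'s; it uses $\sum_{j>i}\pi_{i,j}f_j=\pi_{i,i+1}Z_{i+1}$ to compute $V(S_n)$:

```python
import numpy as np


def contraction_coefficient(P) -> float:
    return 0.5 * float(np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2).max())


# Hypothetical chain: random kernels K[k-1] = pi_{k,k+1}, centred f's.
rng = np.random.default_rng(5)
m, n = 3, 40
mu = rng.dirichlet(np.ones(m))
K = [A / A.sum(axis=1, keepdims=True) for A in rng.random((n - 1, m, m)) + 0.1]
marg = [mu]
for P in K:
    marg.append(marg[-1] @ P)
F = [g - marg[k] @ g for k, g in enumerate(rng.uniform(-1, 1, size=(n, m)))]

# Z_n = f_n, Z_k = f_k + pi_{k,k+1} Z_{k+1}  (list index k-1 holds Z_k)
Z = [None] * n
Z[-1] = F[-1].copy()
for k in range(n - 2, -1, -1):
    Z[k] = F[k] + K[k] @ Z[k + 1]

# Exact V(S_n) = sum_i E[f_i^2] + 2 sum_{i<n} E[f_i(X_i) (pi_{i,i+1} Z_{i+1})(X_i)]
var_S = sum(marg[i] @ F[i] ** 2 for i in range(n))
var_S += 2 * sum(marg[i] @ (F[i] * (K[i] @ Z[i + 1])) for i in range(n - 1))

# Right-hand side of (4.3): second moments of martingale differences + V(Z_1)
mart = sum(marg[k - 1] @ (K[k - 1] @ Z[k] ** 2 - (K[k - 1] @ Z[k]) ** 2)
           for k in range(1, n))
decomposition = mart + marg[0] @ Z[0] ** 2

# (4.5): V(S_n) >= (alpha_n / 4) * sum_i V(f_i(X_i))
alpha_n = min(1.0 - contraction_coefficient(P) for P in K)
lower = alpha_n / 4 * sum(marg[i] @ F[i] ** 2 for i in range(n))
print(var_S, decomposition, lower)
```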



The next estimate shows that the asymptotics of $S_n/\sqrt{V(S_n)}$ depend only on the martingale approximant $M_n^{(n)}$, and that the differences $\xi_k^{(n)}$ are negligible.

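Lemma 4.2 Under condition (1.4), we have that

$$\lim_{n\to\infty}\ \sup_{1\le k\le n}\frac{\|Z_k^{(n)}\|_{L^\infty}}{\sqrt{V(S_n)}} = 0.$$

Proof. By Lemma 4.1 and Proposition 4.1,

$$\frac{\|Z_k^{(n)}\|_{L^\infty}}{\sqrt{V(S_n)}} \le \frac{4C_n}{\alpha_n^{3/2}\Big(\sum_{i=1}^n V\big(f_i^{(n)}(X_i^{(n)})\big)\Big)^{1/2}}.$$

The lemma now follows from (1.4). □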
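The constant and the link with (1.4) can be made explicit: $2C_n\alpha_n^{-1}\cdot(\alpha_n/4)^{-1/2} = 4C_n\alpha_n^{-3/2}$, and squaring the resulting bound gives

```latex
\[
  \Bigg(\frac{4C_n}{\alpha_n^{3/2}\big(\sum_{i=1}^n V(f_i^{(n)}(X_i^{(n)}))\big)^{1/2}}\Bigg)^{2}
  = 16\,\frac{C_n^2\,\alpha_n^{-3}}{\sum_{i=1}^n V\big(f_i^{(n)}(X_i^{(n)})\big)},
\]
```

which is 16 times the quantity required to vanish in (1.4).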
We now turn to showing the LLN part of Proposition 3.1 for the array $\{\xi_k^{(n)}\}$.

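Lemma 4.3 Let $\{Y_l^{(n)} : 1\le l\le n\}$ and $\{\mathcal{F}_l^{(n)} : 1\le l\le n\}$, for $n\ge 1$, be respectively an array of non-negative variables and $\sigma$-fields such that $\sigma\{Y_1^{(n)},\dots,Y_l^{(n)}\}\subset\mathcal{F}_l^{(n)}$. Suppose that

$$\lim_{n\to\infty} E\Big[\sum_{l=1}^n Y_l^{(n)}\Big] = 1 \quad\text{and}\quad \sup_{1\le l\le n}\|Y_l^{(n)}\|_{L^\infty}\le\epsilon_n,$$

where $\lim_{n\to\infty}\epsilon_n=0$. In addition, assume

$$\lim_{n\to\infty}\ \sup_{1\le l\le n-1}\mathrm{Osc}\Big(E\Big[\sum_{j=l+1}^n Y_j^{(n)}\,\Big|\,\mathcal{F}_l^{(n)}\Big]\Big) = 0,$$

where the oscillation of a random variable is the difference between its essential supremum and essential infimum. Then,

$$\lim_{n\to\infty}\sum_{l=1}^n Y_l^{(n)} = 1 \quad\text{in } L^2.$$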

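Proof. Write

$$E\Big[\Big(\sum_{l=1}^n Y_l^{(n)}\Big)^2\Big] = \sum_{l=1}^n E\big[(Y_l^{(n)})^2\big] + 2\sum_{l=1}^{n-1}E\Big[Y_l^{(n)}\Big(\sum_{j=l+1}^n Y_j^{(n)}\Big)\Big].$$

The first sum on the right-hand side is bounded as follows. From non-negativity,

$$\sum_{l=1}^n E\big[(Y_l^{(n)})^2\big] \le \epsilon_n\sum_{l=1}^n E[Y_l^{(n)}] = \epsilon_n(1+o(1))\to 0 \quad\text{as } n\uparrow\infty.$$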
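Consider now the second sum. Write

$$\sum_{l=1}^{n-1}E\Big[Y_l^{(n)}\Big(\sum_{j=l+1}^n Y_j^{(n)}\Big)\Big] = \sum_{l=1}^{n-1}E\Big[Y_l^{(n)}\,E\Big[\sum_{j=l+1}^n Y_j^{(n)}\,\Big|\,\mathcal{F}_l^{(n)}\Big]\Big].$$

From the oscillation assumption, we have that

$$\sup_{1\le l\le n-1}\Big|E\Big[\sum_{j=l+1}^n Y_j^{(n)}\,\Big|\,\mathcal{F}_l^{(n)}\Big] - E\Big[\sum_{j=l+1}^n Y_j^{(n)}\Big]\Big| = o(1).$$

Therefore,

$$\begin{aligned} 2\sum_{l=1}^{n-1}E\Big[Y_l^{(n)}\Big(\sum_{j=l+1}^n Y_j^{(n)}\Big)\Big] &= 2\sum_{l=1}^{n-1}E[Y_l^{(n)}]\,E\Big[\sum_{j=l+1}^n Y_j^{(n)}\Big] + o(1)\sum_{l=1}^n E[Y_l^{(n)}]\\ &= \Big(\sum_{l=1}^n E[Y_l^{(n)}]\Big)^2 - \sum_{l=1}^n\big(E[Y_l^{(n)}]\big)^2 + o(1)\\ &= 1+o(1), \end{aligned}$$

where in the last line we used $\sum_{l}(E[Y_l^{(n)}])^2\le\epsilon_n\sum_l E[Y_l^{(n)}]\to 0$. Putting these statements together, we obtain the lemma. □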
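For completeness, the two moment estimates combine into the $L^2$ statement as follows:

```latex
\[
  E\Big[\Big(\sum_{l=1}^n Y_l^{(n)} - 1\Big)^2\Big]
  = E\Big[\Big(\sum_{l=1}^n Y_l^{(n)}\Big)^2\Big] - 2E\Big[\sum_{l=1}^n Y_l^{(n)}\Big] + 1
  = \big(o(1) + 1 + o(1)\big) - 2\big(1 + o(1)\big) + 1 = o(1).
\]
```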
To apply this result to our situation, we will need the following oscillation estimate. 

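Lemma 4.4 Let $v_l^{(n)} = E\big[(\xi_l^{(n)})^2\,\big|\,X_{l-1}^{(n)}\big]$ for $2\le l\le n$, and $\mathcal{F}_j^{(n)}=\sigma\{X_1^{(n)},\dots,X_j^{(n)}\}$ for $1\le j\le n$. Then, under condition (1.4), we have

$$\sup_{1\le l\le n-1}\mathrm{Osc}\Big(E\Big[\sum_{j=l+1}^n v_j^{(n)}\,\Big|\,\mathcal{F}_l^{(n)}\Big]\Big) = o(1).$$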



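Proof. From the martingale and Markov properties, we have $E[\xi_r^{(n)}\xi_s^{(n)}\,|\,X_u^{(n)}]=0$ for $r>s>u$. Then, since by the Markov property conditioning on $\mathcal{F}_l^{(n)}$ reduces here to conditioning on $X_l^{(n)}$,

$$\begin{aligned} E\Big[\sum_{j=l+1}^n v_j^{(n)}\,\Big|\,X_l^{(n)}\Big] &= E\Big[\sum_{j=l+1}^n(\xi_j^{(n)})^2\,\Big|\,X_l^{(n)}\Big]\\ &= E\Big[\Big(\sum_{j=l+1}^n\xi_j^{(n)}\Big)^2\,\Big|\,X_l^{(n)}\Big]\\ &= V(S_n)^{-1}E\Big[\Big(\sum_{j=l+1}^n f_j^{(n)}(X_j^{(n)}) - E\big[Z_{l+1}^{(n)}\,\big|\,X_l^{(n)}\big]\Big)^2\,\Big|\,X_l^{(n)}\Big]\\ &= V(S_n)^{-1}E\Big[\Big(\sum_{j=l+1}^n f_j^{(n)}(X_j^{(n)})\Big)^2\,\Big|\,X_l^{(n)}\Big] + o(1), \end{aligned}$$

where we rewrite $\xi_j^{(n)}$ with (4.4) in the third line, and use Lemmas 4.1 and 4.2 in the last line.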
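The rewriting in the third line is a telescoping consequence of (4.2). Since $Z_{j+1}-E[Z_{j+1}\,|\,X_j]=f_j(X_j)-Z_j+Z_{j+1}$ for $1\le j\le n-1$, $Z_n^{(n)}=f_n^{(n)}(X_n^{(n)})$ and $Z_l^{(n)}=f_l^{(n)}(X_l^{(n)})+E[Z_{l+1}^{(n)}\,|\,X_l^{(n)}]$, one way to write it out is:

```latex
\[
\begin{aligned}
\sqrt{V(S_n)}\sum_{j=l+1}^n \xi_j^{(n)}
  &= \sum_{j=l}^{n-1}\Big(Z_{j+1}^{(n)} - E\big[Z_{j+1}^{(n)}\,\big|\,X_j^{(n)}\big]\Big)
   = \sum_{j=l}^{n-1} f_j^{(n)}(X_j^{(n)}) + Z_n^{(n)} - Z_l^{(n)} \\
  &= \sum_{j=l+1}^{n} f_j^{(n)}(X_j^{(n)}) - E\big[Z_{l+1}^{(n)}\,\big|\,X_l^{(n)}\big].
\end{aligned}
\]
```

Because $E[\sum_{j=l+1}^n f_j^{(n)}(X_j^{(n)})\,|\,X_l^{(n)}]=E[Z_{l+1}^{(n)}\,|\,X_l^{(n)}]$, expanding the square in the third line leaves the last line minus $V(S_n)^{-1}(E[Z_{l+1}^{(n)}\,|\,X_l^{(n)}])^2$, and this correction is $o(1)$ uniformly in $l$.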
Therefore, let us consider oscillations of 

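$$E\Big[\Big(\sum_{j=l+1}^n f_j^{(n)}(X_j^{(n)})\Big)^2\,\Big|\,X_l^{(n)}\Big] = \sum_{j,m=l+1}^n E\big[f_j^{(n)}(X_j^{(n)})\,f_m^{(n)}(X_m^{(n)})\,\big|\,X_l^{(n)}\big]. \qquad (4.6)$$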

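From Lemma 4.1 and (1.1), we have the bound, for $l<j\le m$,

$$\mathrm{Osc}\Big(E\big[f_j^{(n)}(X_j^{(n)})f_m^{(n)}(X_m^{(n)})\,\big|\,X_l^{(n)}\big]\Big) = \mathrm{Osc}\Big(E\big[f_j^{(n)}(X_j^{(n)})\,E[f_m^{(n)}(X_m^{(n)})\,|\,X_j^{(n)}]\,\big|\,X_l^{(n)}\big]\Big) \le 4C_n^2(1-\alpha_n)^{j-l}(1-\alpha_n)^{m-j}.$$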
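The constant $16C_n^2\alpha_n^{-2}$ used next comes from summing this per-term bound over all pairs $(j,m)$, using that the oscillation of a sum is at most the sum of the oscillations. One way to carry out the count:

```latex
\[
\sum_{j,m=l+1}^{n} 4C_n^2(1-\alpha_n)^{(j\vee m)-l}
 \le 2\sum_{l+1\le j\le m\le n} 4C_n^2(1-\alpha_n)^{m-l}
 = 8C_n^2\sum_{k=1}^{n-l} k\,(1-\alpha_n)^{k}
 \le 8C_n^2\,\frac{1-\alpha_n}{\alpha_n^{2}}
 \le 16\,C_n^2\,\alpha_n^{-2}.
\]
```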

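Therefore, the oscillations of (4.6) are bounded by $16C_n^2\alpha_n^{-2}$ uniformly in $l$. Hence, using Proposition 4.1, we obtain

$$\sup_{1\le l\le n-1}\mathrm{Osc}\Big(E\Big[\sum_{j=l+1}^n v_j^{(n)}\,\Big|\,\mathcal{F}_l^{(n)}\Big]\Big) \le (16)(4)\,\frac{C_n^2\,\alpha_n^{-3}}{\sum_{i=1}^n V\big(f_i^{(n)}(X_i^{(n)})\big)} + o(1),$$

which is $o(1)$ by (1.4). □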

Proof of Theorem 1.1. From Lemma 4.2, we need only show that $M_n^{(n)}=(S_n-Z_1^{(n)})/\sqrt{V(S_n)}\Rightarrow N(0,1)$. This will follow from martingale convergence (Proposition 3.1) as soon as we show (1) $\sup_{2\le k\le n}\|\xi_k^{(n)}\|_{L^\infty}\to 0$ and (2) $\sum_{k=2}^n E[(\xi_k^{(n)})^2\,|\,\mathcal{F}_{k-1}^{(n)}]\to 1$ in probability. However, (1) follows from the negligibility estimate Lemma 4.2, and (2) from the LLN Lemmas 4.3 and 4.4 (which give convergence in $L^2$), since "negligibility" (1) holds and $\sum_{k=2}^n E[(\xi_k^{(n)})^2]=1+o(1)$ (from the variance decomposition near (4.3) and Lemma 4.2). □

5 Proof of Variance Lower Bound 

In this section, we prove Proposition 4.1.

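Lemma 5.1 Let $f$ and $g$ be measurable functions on $(X,\mathcal{B}(X))$. Let $\lambda$ be a probability measure on $X\times X$ with marginals $\alpha$ and $\beta$ respectively. Let $\pi(x_1,dx_2)$ and $\hat\pi(x_2,dx_1)$ be the transition probabilities in the two directions, so that

$$\alpha\pi=\beta, \qquad \beta\hat\pi=\alpha.$$

If $f\in L^2(\alpha)$, $g\in L^2(\beta)$ and

$$\int f(x_1)\,\alpha(dx_1) = \int g(x_2)\,\beta(dx_2) = 0,$$

then,

$$\Big|\iint f(x_1)\,g(x_2)\,\lambda(dx_1,dx_2)\Big| \le \sqrt{\delta(\pi)}\,\|f\|_{L^2(\alpha)}\,\|g\|_{L^2(\beta)}.$$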
Proof. Let us construct a measure on $X\times X\times X$ by starting with $\lambda$ on $X\times X$ and using the reversed kernel $\hat\pi(x_2,dx_3)$ to go from $x_2$ to $x_3$. The transition probability from $x_1$ to $x_3$ defined by

$$Q(x_1,A) = \int\pi(x_1,dx_2)\,\hat\pi(x_2,A)$$

satisfies $\delta(Q)\le\delta(\pi)$ by (1.1). Moreover $\alpha Q=\alpha$, and the operator $Q$ is self-adjoint and bounded with norm 1 on $L^2(\alpha)$. Then, if $f$ is a bounded function with $\int f(x)\,\alpha(dx)=0$ (and so $E_\alpha[Q^n f]=0$), we have for $n\ge 1$,

$$\|Q^n f\|_{L^2(\alpha)} \le \|Q^n f\|_{L^\infty} \le (\delta(Q))^n\,\mathrm{Osc}(f). \qquad (5.1)$$

Hence, as bounded functions are dense, on the subspace of functions $\mathcal{M}=\{f\in L^2(\alpha) : \int f(x)\,\alpha(dx)=0\}$ the top of the spectrum of $Q$ is at most $\delta(Q)$, and so $\|Q\|_{L^2(\alpha),\mathcal{M}}\le\delta(Q)$. Indeed, suppose the spectral radius of $Q$ on $\mathcal{M}$ is larger than $\delta(Q)+\epsilon$ for some $\epsilon>0$, and $f\in\mathcal{M}$ is a non-trivial bounded function whose spectral decomposition is with respect to spectral values larger than $\delta(Q)+\epsilon$. Then $\|Q^n f\|_{L^2(\alpha)}\ge\|f\|_{L^2(\alpha)}(\delta(Q)+\epsilon)^n$, which contradicts the bound (5.1) when $n\uparrow\infty$. [Cf. Thm. 2.10 for a proof in discrete space settings.]

Then, for $f\in\mathcal{M}$,

$$\|\hat\pi f\|^2_{L^2(\beta)} = \langle\pi\hat\pi f,f\rangle_{L^2(\alpha)} = \langle Qf,f\rangle_{L^2(\alpha)} \le \|Q\|_{L^2(\alpha),\mathcal{M}}\,\|f\|^2_{L^2(\alpha)} \le \delta(Q)\,\|f\|^2_{L^2(\alpha)}.$$

Finally,

$$\Big|\iint f(x_1)\,g(x_2)\,\lambda(dx_1,dx_2)\Big| = \big|\langle\hat\pi f,g\rangle_{L^2(\beta)}\big| \le \sqrt{\delta(\pi)}\,\|f\|_{L^2(\alpha)}\,\|g\|_{L^2(\beta)}.$$

□
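In finite dimensions the spectral step can be observed directly: $Q$ is similar to the symmetric matrix $D^{1/2}QD^{-1/2}$ with $D=\mathrm{diag}(\alpha)$, its top eigenvalue 1 comes from constants, and the remaining eigenvalues (the spectrum on $\mathcal{M}$) lie below $\delta(Q)$. A sketch assuming a hypothetical strictly positive joint law $\lambda$ on a 6-point space:

```python
import numpy as np


def contraction_coefficient(P) -> float:
    return 0.5 * float(np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2).max())


rng = np.random.default_rng(4)
m = 6
lam = rng.random((m, m))
lam /= lam.sum()                      # hypothetical joint law of (x1, x2)
a = lam.sum(axis=1)                   # marginal alpha (of x1)
b = lam.sum(axis=0)                   # marginal beta (of x2)
pi = lam / a[:, None]                 # forward kernel pi(x1, dx2)
pi_hat = (lam / b[None, :]).T         # reversed kernel pi_hat(x2, dx1)
Q = pi @ pi_hat                       # Q(x1, dx3)

# Self-adjointness in L^2(alpha): D^{1/2} Q D^{-1/2} is symmetric
s = np.sqrt(a)
S = s[:, None] * Q / s[None, :]
spec = np.sort(np.abs(np.linalg.eigvalsh((S + S.T) / 2)))[::-1]

# spec[0] = 1 (constants); spec[1] is the spectral radius of Q on M
print(spec[0], spec[1], contraction_coefficient(Q), contraction_coefficient(pi))
```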

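Lemma 5.2 Let $f(x_1)$ and $g(x_2)$ be square integrable with respect to $\alpha$ and $\beta$ respectively. Then,

$$E_\lambda\big[(f(x_1)-g(x_2))^2\big] \ge \alpha(\pi)\,V(f(x_1))$$

as well as

$$E_\lambda\big[(f(x_1)-g(x_2))^2\big] \ge \alpha(\pi)\,V(g(x_2)).$$

Proof. We can assume without loss of generality that $f$ and $g$ have mean 0 with respect to $\alpha$ and $\beta$ respectively, since centering can only decrease the left-hand side and does not change the variances. Then, by Lemma 5.1,

$$\begin{aligned} E_\lambda\big[(f(x_1)-g(x_2))^2\big] &= E_\alpha\big[f(x_1)^2\big] + E_\beta\big[g(x_2)^2\big] - 2E_\lambda\big[f(x_1)g(x_2)\big]\\ &\ge E_\alpha\big[f(x_1)^2\big] + E_\beta\big[g(x_2)^2\big] - 2\sqrt{\delta(\pi)}\,\|f\|_{L^2(\alpha)}\|g\|_{L^2(\beta)}\\ &\ge (1-\delta(\pi))\,\|f\|^2_{L^2(\alpha)} = \alpha(\pi)\,\|f\|^2_{L^2(\alpha)}. \end{aligned}$$

The proof of the second half is identical. □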
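Both lemmas can be spot-checked on random finite joint laws. This is a numerical sanity sketch under the assumption of strictly positive random $\lambda$, not a substitute for the proofs; here $\delta(\pi)$ is taken for the forward kernel $\pi(x_1,dx_2)=\lambda(dx_1,dx_2)/\alpha(dx_1)$:

```python
import numpy as np


def contraction_coefficient(P) -> float:
    return 0.5 * float(np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2).max())


rng = np.random.default_rng(5)
m, checked = 5, 0
for _ in range(500):
    lam = rng.random((m, m))
    lam /= lam.sum()                         # joint law of (x1, x2)
    a, b = lam.sum(axis=1), lam.sum(axis=0)  # marginals alpha, beta
    delta = contraction_coefficient(lam / a[:, None])  # delta(pi), forward kernel
    f = rng.normal(size=m)
    f -= a @ f                               # mean zero under alpha
    g = rng.normal(size=m)
    g -= b @ g                               # mean zero under beta
    nf, ng = np.sqrt(a @ f**2), np.sqrt(b @ g**2)  # L^2(alpha), L^2(beta) norms
    cross = f @ lam @ g                      # E_lambda[f(x1) g(x2)]
    assert abs(cross) <= np.sqrt(delta) * nf * ng + 1e-12            # Lemma 5.1
    msd = a @ f**2 + b @ g**2 - 2 * cross    # E_lambda[(f(x1) - g(x2))^2]
    assert msd >= (1 - delta) * max(nf**2, ng**2) - 1e-12            # Lemma 5.2
    checked += 1
```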
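Proof of Proposition 4.1. Applying Lemma 5.2 to the Markov pairs $\{(X_k^{(n)},X_{k+1}^{(n)}) : 1\le k\le n-1\}$ with $f(X_k^{(n)})=E[Z_{k+1}^{(n)}\,|\,X_k^{(n)}]$ and $g(X_{k+1}^{(n)})=Z_{k+1}^{(n)}$, and noting that $\alpha(\pi^{(n)}_{k,k+1})\ge\alpha_n$ and $E[Z_{k+1}^{(n)}]=0$, we get

$$E\Big[\big(Z_{k+1}^{(n)}-E[Z_{k+1}^{(n)}\,|\,X_k^{(n)}]\big)^2\Big] \ge \alpha_n\,E\big[(Z_{k+1}^{(n)})^2\big].$$

On the other hand, from (4.2), for $1\le k\le n-1$, we have

$$\begin{aligned} V\big(f_k^{(n)}(X_k^{(n)})\big) &\le E\big[(f_k^{(n)}(X_k^{(n)}))^2\big]\\ &\le 2E\big[(Z_k^{(n)})^2\big] + 2E\Big[\big(E[Z_{k+1}^{(n)}\,|\,X_k^{(n)}]\big)^2\Big]\\ &\le 2E\big[(Z_k^{(n)})^2\big] + 2E\big[(Z_{k+1}^{(n)})^2\big]. \end{aligned}$$

Summing over $k$ (recall $f_n^{(n)}(X_n^{(n)})=Z_n^{(n)}$), and noting the variance decomposition near (4.3),

$$\begin{aligned} \sum_{k=1}^n V\big(f_k^{(n)}(X_k^{(n)})\big) &\le 4\sum_{k=1}^n E\big[(Z_k^{(n)})^2\big]\\ &\le \frac{4}{\alpha_n}\Big[\sum_{k=1}^{n-1}E\Big[\big(Z_{k+1}^{(n)}-E[Z_{k+1}^{(n)}\,|\,X_k^{(n)}]\big)^2\Big] + E\big[(Z_1^{(n)})^2\big]\Big]\\ &= \frac{4}{\alpha_n}\,V(S_n). \end{aligned}$$

□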